Dynamic Time Warping
   HOME

TheInfoList



OR:

In
time series analysis In mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in m ...
, dynamic time warping (DTW) is an
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were
acceleration In mechanics, acceleration is the rate of change of the velocity of an object with respect to time. Accelerations are vector quantities (in that they have magnitude and direction). The orientation of an object's acceleration is given by the ...
s and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a linear sequence can be analyzed with DTW. A well-known application has been automatic
speech recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the m ...
, to cope with different speaking speeds. Other applications include
speaker recognition Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to ''speaker recognition'' or speech recognition. Speaker verification ...
and online
signature recognition Signature recognition is an example of behavioral biometrics that identifies a person based on their handwriting. It can be operated in two different ways: Static: In this mode, users write their signature on paper, and after the writing is compl ...
. It can also be used in partial
shape matching A shape or figure is a graphical representation of an object or its external boundary, outline, or external surface, as opposed to other properties such as color, texture, or material type. A plane shape or plane figure is constrained to lie on ...
applications. In general, DTW is a method that calculates an optimal match between two given sequences (e.g.
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
) with certain restriction and rules: * Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa * The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match) * The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match) * The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence, such that index i is matched with index l and index j is matched with index k, and vice versa The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values. The sequences are "warped"
non-linear In mathematics and science, a nonlinear system is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many other ...
ly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This
sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Alig ...
method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the
triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of degenerate triangles, but ...
to hold. In addition to a similarity measure between the two sequences, a so called "warping path" is produced, by warping according to this path the two signals may be aligned in time. The signal with an original set of points ''X''(original), ''Y''(original) is transformed to ''X''(warped), ''Y''(warped). This finds applications in genetic sequence and audio synchronisation. In a related technique sequences of varying speed may be averaged using this technique see the average sequence section. This is conceptually very similar to the
Needleman–Wunsch algorithm The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to compare biological sequences. The algorithm was developed by Saul ...
.


Implementation

This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. d(x, y) = , x - y , . int DTWDistance(s: array ..n t: array ..m where DTW
, j The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline (t ...
/code> is the distance between s :i/code> and t :j/code> with the best alignment. We sometimes want to add a locality constraint. That is, we require that if s /code> is matched with t /code>, then , i - j , is no larger than w, a window parameter. We can easily modify the above algorithm to add a locality constraint (differences marked). However, the above given modification works only if , n - m , is no larger than w, i.e. the end point is within the window length from diagonal. In order to make the algorithm work, the window parameter w must be adapted so that , n - m , \le w (see the line marked with (*) in the code). int DTWDistance(s: array ..n t: array ..mmark>, w: int)


Warping properties

The DTW algorithm produces a discrete matching between existing elements of one series to another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matching warping. The segment scaling causes potential creation of new elements, by time-scaling segments either down or up, and thus produces a more sensitive warping than DTW's discrete matching of raw elements.


Complexity

The time complexity of the DTW algorithm is O(NM), where N and M are the lengths of the two input sequences. The 50 years old quadratic time bound was recently broken, an algorithm due to Gold and Sharir enables computing DTW in O(/\log \log(N)) time and space for two input sequences of length N. This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form O(N^) for some \epsilon > 0 cannot exist unless the Strong exponential time hypothesis fails. While the dynamic programming algorithm for DTW requires O(NM) space in a naive implementation, the space consumption can be reduced to O(min(N,M)) using
Hirschberg's algorithm In computer science, Hirschberg's algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the optimal sequence alignment between two strings. Optimality is measured with the Levenshtein distance, define ...
.


Fast computation

Fast techniques for computing DTW include Early Abandoned and Pruned DTW, PrunedDTW, SparseDTW, FastDTW, and the MultiscaleDTW. A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh, LB_Improved, LB_Enhanced, LB_Webb or LB_Petitjean. However, the Early Abandon and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective. In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient. Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute. LB_Petitjean is the tightest known lower bound that can be computed in linear time.


Average sequence

Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to the one of the
multiple alignment Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutio ...
and requires heuristics. DBA is currently a reference method to average a set of sequences consistently with DTW. COMASA efficiently randomizes the search for the average sequence, using DBA as a local optimization process.


Supervised learning

A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure.


Alternative approaches

In
functional data analysis Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional ...
, time series are regarded as discretizations of smooth (differentiable) functions of time. By viewing the observed samples at smooth functions, one can utilize continuous mathematics for analyzing data. Smoothness and monotonicity of time warp functions may be obtained for instance by integrating a time-varying
radial basis function A radial basis function (RBF) is a real-valued function \varphi whose value depends only on the distance between the input and some fixed point, either the origin, so that \varphi(\mathbf) = \hat\varphi(\left\, \mathbf\right\, ), or some other fixed ...
, thus being a one-dimensional
diffeomorphism In mathematics, a diffeomorphism is an isomorphism of smooth manifolds. It is an invertible function that maps one differentiable manifold to another such that both the function and its inverse are differentiable. Definition Given two m ...
. Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements. Another related approach are
hidden Markov model A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an ob ...
s (HMM) and it has been shown that the
Viterbi algorithm The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especiall ...
used to search for the most likely path through the HMM is equivalent to stochastic DTW. DTW and related warping methods are typically used as pre- or post-processing steps in data analyses. If the observed sequences contain both random variation in both their values, shape of observed sequences and random temporal misalignment, the warping may overfit to noise leading to biased results. A simultaneous model formulation with random variation in both values (vertical) and time-parametrization (horizontal) is an example of a
nonlinear mixed-effects model Nonlinear mixed-effects models constitute a class of statistical models generalizing mixed model, linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within t ...
. In human movement analysis, simultaneous nonlinear mixed-effects modeling has been shown to produce superior results compared to DTW.


Open-source software

* Th
tempo
C++ library with Python bindings implements Early Abandoned and Pruned DTW as well as lower bounds LB_Keogh, LB_Enhanced and LB_Webb. * Th
UltraFastMPSearch
Java library implements the UltraFastWWSearch algorithm for fast warping window tuning. * Th
lbimproved
C++ library implements Fast Nearest-Neighbor Retrieval algorithms under the GNU General Public License (GPL). It also provides a C++ implementation of dynamic time warping, as well as various lower bounds. * Th
FastDTW
library is a Java implementation of DTW and a FastDTW implementation that provides optimal or near-optimal alignments with an ''O''(''N'') time and memory complexity, in contrast to the ''O''(''N''2) requirement for the standard DTW algorithm. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution.
FastDTW fork
(Java) published to Maven Central.
time-series-classification
(Java) a package for time series classification using DTW in Weka. * Th
DTW suite
provides Python
dtw-python
and R packages
dtw
with a comprehensive coverage of the DTW algorithm family members, including a variety of recursion rules (also called step patterns), constraints, and substring matching. * The
mlpy mlpy is a Python, open-source, machine learning library built on top of NumPy/SciPy, the GNU Scientific Library and it makes an extensive use of the Cython language. mlpy provides a wide range of state-of-the-art machine learning methods for supe ...
Python library implements DTW. * Th
pydtw
Python library implements the Manhattan and Euclidean flavoured DTW measures including the LB_Keogh lower bounds. * Th
cudadtw
C++/CUDA library implements subsequence alignment of Euclidean-flavoured DTW and ''z''-normalized Euclidean distance similar to the popular UCR-Suite on CUDA-enabled accelerators. * Th
JavaML
machine learning library implement
DTW
* Th
ndtw
C# library implements DTW with various options.
Sketch-a-Char
uses Greedy DTW (implemented in JavaScript) as part of LaTeX symbol classifier program. * Th
MatchBox
implements DTW to match mel-frequency cepstral coefficients of audio signals.
Sequence averaging
a GPL Java implementation of DBA. * Th
Gesture Recognition Toolkit, GRT
C++ real-time gesture-recognition toolkit implements DTW. * Th
PyHubs
software package implements DTW and nearest-neighbour classifiers, as well as their extensions (hubness-aware classifiers). * Th
simpledtw
Python library implements the classic ''O''(''NM'') Dynamic Programming algorithm and bases on Numpy. It supports values of any dimension, as well as using custom norm functions for the distances. It is licensed under the MIT license. * Th
tslearn
Python library implements DTW in the time-series context. *Th
cuTWED
CUDA Python library implements a state of the art improved
Time Warp Edit Distance Time Warp Edit Distance (TWED) is a measure of similarity (or dissimilarity) for discrete time series matching with time ' elasticity'. In comparison to other distance measures, (e.g. DTW (dynamic time warping) or LCS (longest common subsequenc ...
using only linear memory with phenomenal speedups.
DynamicAxisWarping.jl
Is a Julia implementation of DTW and related algorithms such as FastDTW, SoftDTW, GeneralDTW and DTW barycenters.


Applications


Spoken-word recognition

Due to different speaking rates, a non-linear fluctuation occurs in speech pattern versus time axis, which needs to be eliminated. DP matching is a pattern-matching algorithm based on dynamic programming (DP), which uses a time-normalization effect, where the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximal coincidence is attained with the other. Moreover, if the warping function is allowed to take any possible value, distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the warping function slope.


Correlation power analysis

Unstable clocks are used to defeat naive
power analysis Power analysis is a form of side channel attack in which the attacker studies the power consumption of a cryptographic hardware device. These attacks rely on basic physical properties of the device: semiconductor devices are governed by the ...
. Several techniques are used to counter this defense, one of which is dynamic time warping.


Finance and econometrics

Dynamic time warping is used in finance and econometrics to assess the quality of the prediction versus real-world data.


See also

*
Levenshtein distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-charact ...
*
Elastic matching Elastic matching is one of the pattern recognition techniques in computer science. Elastic matching (EM) is also known as deformable template, flexible matching, or nonlinear template matching. Elastic matching can be defined as an optimization pro ...
*
Sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Alig ...
*
Multiple sequence alignment Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutio ...
*
Wagner–Fischer algorithm In computer science, the Wagner–Fischer algorithm is a dynamic programming algorithm that computes the edit distance between two strings of characters. History The Wagner–Fischer algorithm has a history of multiple invention. Navarro lists the ...
*
Needleman–Wunsch algorithm The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to compare biological sequences. The algorithm was developed by Saul ...
*
Fréchet distance In mathematics, the Fréchet distance is a measure of similarity between curves that takes into account the location and ordering of the points along the curves. It is named after Maurice Fréchet. Intuitive definition Imagine a person traversin ...
*
Nonlinear mixed-effects model Nonlinear mixed-effects models constitute a class of statistical models generalizing mixed model, linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within t ...


References


Further reading

* Pavel Senin
Dynamic Time Warping Algorithm Review
* * * * * * {{cite journal , title=Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping , author=Rakthanmanon, Thanawin , journal=ACM Transactions on Knowledge Discovery from Data , date=September 2013 , volume=7 , issue=3 , pages=10:1–10:31 , doi=10.1145/2513092.2500489, pmid=31607834 , pmc=6790126 Dynamic programming Articles with example pseudocode Machine learning algorithms Multivariate time series